In this script, we’ll introduce the basic approach to Resource Selection Function (RSF) modeling. We’ll use the wildebeest dataset from the past few lectures and conduct a point-based analysis to evaluate habitat suitability. This vignette is meant as a first step towards more complex models that incorporate, rather than ignore, the animal movement process. We will rely primarily on the functionality provided for RSF analysis in the package amt.
We’ll be following details provided in the following publications:
Fieberg, J., J. Signer, B. Smith, and T. Avgar. 2021. A ‘How to’ guide for interpreting parameters in habitat‐selection analyses. Journal of Animal Ecology 00:1-17.
Signer, J., J. Fieberg, and T. Avgar. 2019. Animal movement tools (amt): R package for managing tracking data and conducting habitat selection analyses. Ecology and evolution 9(2): 880-890.
Stabach, J.A., G. Wittemyer, R.B. Boone, R.S. Reid, J.S. Worden. 2016. Variation in habitat selection by white-bearded wildebeest across different degrees of human disturbance. Ecosphere 7(8).
Load libraries and set up your workspace for analyses.
# Remove everything from memory
rm(list=ls())
# set.seed() - This function ensures that any generation of "random" points/numbers will be the same each time you run the script.
set.seed(533)
# You may need to install these packages first
#install.packages(c('amt', 'tidyverse', 'lme4', 'terra', 'sf', 'sjPlot', 'visreg', 'usdm', 'tmap', 'jtools'))
# Load libraries
library(amt)
library(tidyverse)
library(lme4)
library(terra)
library(sf)
library(sjPlot)
library(visreg)
library(usdm)
library(tmap)
library(jtools)
# Set all tmaps to plot in view mode
tmap_mode("view")
Read in all your spatial data. The animal movement data is a ‘flat’ dataframe (a non-projected file) with coordinates that have been extracted in UTM 37 S, WGS84 (previous exercise). The raster layers are projected to Albers Equal Area. We will need to project the point data to match the raster data for analyses.
Note, the projection used for analyses doesn’t matter too much, as long as you are consistent between the data layers (point and raster) you are using. It is important, however, to use a projection that minimizes the amount of distortion across your study area.
# Load a polygon shapefile of the study area. The file is projected to Albers Equal Area. This will be our prediction area.
Athi.Bound <- st_read("Data/Athi.shp")
## Reading layer `Athi' from data source
## `D:\GitHub\Courses\IntroductionToAnimalMovements\Wildebeest_RSF\China_Course\Data\Athi.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 1 field
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 1203934 ymin: -230250.7 xmax: 1256094 ymax: -160721.6
## Projected CRS: Africa_Albers_Equal_Area_Conic
# Load all raster layers for Athi-Kaputiei Plains study area. The raster stack consists of 7 raster layers. To be in a stack they all have to have the exact same resolution (250-m) and spatial extent.
rsf.stack <- terra::rast("data/ak_raster_stack.tif")
plot(rsf.stack)
# Data Layers included:
# anth_risk - Anthropogenic Risk, simply an index of human footprint made specifically for this ecosystem (expected negative response)
# Fence_dist - Fence Distance, with fences manually mapped by a Kenyan field team (expected negative response)
# prirds_dist - Primary Road Distance, the distance from primary/paved roads (expected negative response)
# secrds_dist - Secondary Road Distance, the distance from secondary/unpaved roads (expected null response)
# river_dist - River Distance, the distance from permanent rivers (depending on season, but wildebeest must drink - greater attraction in dry season, but a predation risk)
# waterpts_dist - Water Point Distance, the distance to mapped water wells (expected positive response, stronger in dry season)
# woody_dist - Woody Distance, the distance to woody vegetation (based on VCF) (expected negative response - predation risk)
# The fence plot is hard to interpret because there are so many low values in the center of the plot. This means there are a lot of fences there. Let's color the distance values and only show values from 0 to 100 m to better see where the fences are located.
# plot(rsf.stack$fence_dist)
plot(rsf.stack$fence_dist,
range = c(0, 100),
col = "black")
# We can review the range of values in each raster like this.
# summary(values(rsf.stack$fence_dist))
# What is the crs of the raster stack?
crs(rsf.stack, proj=TRUE)
## [1] "+proj=aea +lat_0=0 +lon_0=25 +lat_1=20 +lat_2=-23 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
# *******************************
# Now let's import the GPS tracking dataset.
# This is the 3-hour, regular trajectory.
# Let's import and remove the trajectory information
# We also need to remove the na values in the x/y coordinates
WB.data <- read_rds("Data/wildebeest_3hr_adehabitat.rds") %>%
select(x,
y,
date,
id,
sex) %>%
filter(!is.na(x),
!is.na(y),
!is.na(date))
# Is this a spatial object?
#class(WB.data)
# Is it projected?
#st_crs(WB.data)
# What is the timezone?
#tz(WB.data$date)
# Here's where we need to be careful.
# Our raster layers are projected to Albers Equal Area. Our point layer is not projected, but the x/y coordinates are derived from when the layer was projected to UTM 37S, WGS84.
# This means we should first set the CRS to UTM 37S, WGS84 and then project to Albers Equal Area
WB.data.sf <- WB.data %>%
st_as_sf(coords = c('x', 'y'),
crs = "EPSG:32737") %>% # This is UTM 37S, WGS84 (UtmZone.proj <- "EPSG:32737")
st_transform(crs(rsf.stack)) # Easiest way to set the projection is grab it from the crs of our already defined raster stack
# The specific projection is Albers Equal Area, specific to Africa. This would be defined as:
#AEA.Africa.proj <- "+proj=aea +lat_1=20 +lat_2=-23 +lat_0=0 +lon_0=25 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
# Now what's the CRS?
# st_crs(WB.data.sf)
# Let's plot to make sure everything is aligned.
# Here I'm choosing to plot with a sequential palette with 10 colors on the ramp. You can run tmaptools::palette_explorer() to explore the palette options.
tm_shape(rsf.stack$anth_risk,
name = "Human Risk Level") +
tm_raster(palette = "YlOrRd", n = 10,
alpha = 0.6,
title = "Anth. Risk") +
tm_shape(WB.data.sf %>%
filter(id == "Noontare"),
name = "Noontare Locations") +
tm_dots(size = 0.01,
col = "black") +
tm_shape(Athi.Bound,
name = "Athi-Kaputiei Plains") +
tm_polygons(alpha = 0,
border.col = "green") +
tm_layout("Example Graph")
# This first raster represents anthropogenic risk, and you can see that this wildebeest appears to be responding to this variable. If you zoom into the northern locations, you'll notice there are no locations in the areas of elevated human risk.
# Let's plot one more. Let's look at this wildebeest's points with distance to primary road. Here I want the low values to be red, instead of the high values, so I'm reversing the palette with the "-". I'm also including a lot more colors, otherwise it was difficult to see where the roads actually were. This makes the legend too large though so I'm hiding the legend for the raster.
# tm_shape(rsf.stack$prirds_dist,
# name = "Distance to Primary Rd") +
# tm_raster(palette = "-YlOrRd", n = 30,
# alpha = 0.6,
# legend.show = FALSE) +
# tm_shape(WB.data.sf %>%
# filter(id == "Noontare"),
# name = "Noontare Locations") +
# tm_dots(size = 0.01,
# col = "gray")
# Check tmap_options()
# This is where the default basemaps are set. This is why we see ESRI.WorldGrayCanvas, OpenStreetMap, and ESRI.WorldTopoMap
Here we will take the initial steps to fit an RSF to a single individual using the amt package.
Amt relies heavily on piping, which has a bit of a (steep) learning curve, but can help in streamlining the data processing workflow. We’ll start by creating a trajectory (called a track in amt).
Remember also that RSF analyses assume statistical independence between locations and habitat covariates. We could provide a very conservative estimate of time to independence using ctmm and then resample our data to some interval greater than this time scale. This will surely reduce our sample size. As an example, we will thin the data heavily, sampling 20% of the data randomly for analyses.
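As a toy illustration of this kind of random thinning (made-up coordinates, not the wildebeest track), keeping a random 20% of rows looks like this:

```r
# Toy illustration of random thinning: keep a random 20% of rows.
# 'trk' is a made-up stand-in for a movement track.
set.seed(533)
trk <- data.frame(x = runif(1000), y = runif(1000))

keep.n <- ceiling(nrow(trk) * 0.20)                 # round up to a whole number
trk.thin <- trk[sample(nrow(trk), size = keep.n), ]

nrow(trk.thin)  # 200
```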
# Create Subset of dataset, filtering to 1 animal
nt <- WB.data.sf %>%
mutate(x = st_coordinates(WB.data.sf)[ ,1],
y = st_coordinates(WB.data.sf)[ ,2]) %>%
as_tibble() %>%
select(-geometry) %>%
filter(id == "Ntishya") %>%
mutate(id = droplevels(id))
# Create Movement Track (similar to ltraj function in adehabitatLT)
# We don't really need the time component for this initial analysis. That would be a step-selection function (ssf) analysis. But, we'll include here as an example.
nt.trk <- mk_track(nt,
.x = x,
.y = y,
.t = date,
crs = crs(WB.data.sf), # Alternatively, just include crs = 9822. 9822 is the EPSG code for AEA.
order_by_ts = T,
id = id,
sex = sex)
# Note that amt does not calculate steplengths or turning angles when you make the track. To do so, you'd need to specify those variables:
# nt.trk.test <- nt.trk %>%
# mutate(
# sl = step_lengths(.),
# ta = direction_abs(.) # absolute and relative turning angles can be calculated (direction_rel())
# )
# Remember also that the track should be kept regular; otherwise we have the potential to bias our results (e.g., more points in the day than at night). Our track has already been resampled to a 3-hour interval in a previous lecture.
# In amt, we would do this by using the track_resample command:
nt.trk.rs <- track_resample(nt.trk,
rate = hours(3),
tolerance = minutes(20))
# Let's thin the data as an example
rcd.amt <- ceiling(nrow(nt.trk.rs)*0.20) # ceiling just rounds up to a whole number
nt.trk.rs <- nt.trk.rs[sample(nrow(nt.trk.rs), size = rcd.amt), c("id", "x_", "y_", "sex")]
Perhaps the most important aspect of any habitat selection analysis is the assessment of availability. In a RSF, the assumption is that the distribution of available habitat is constant through time and that the animal has equal access to ALL areas within a user-defined area (Type III Habitat Selection). This assumption may hold true when positions are observed infrequently (more commonly the case with data collected from VHF studies), but less so with many modern telemetry studies.
The goal is to contrast points where an animal was observed with locations that were potentially available. To do so, we will create pseudo-absence points generated throughout an animal's home range. The amt function random_points is a convenient way to do this, with various options for defining the area from which to sample.
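Conceptually, the use/available design can be sketched in base R (toy coordinates, not the wildebeest data): observed 'use' points are stacked with 'available' points drawn uniformly from a bounding region, which stands in crudely for a home-range polygon.

```r
# Base-R sketch of the use/available design behind random_points():
# 'use' points (case_ = TRUE) plus 'available' points (case_ = FALSE)
# drawn uniformly from the bounding box of the used locations.
set.seed(533)
use <- data.frame(x_ = rnorm(100), y_ = rnorm(100))

n.avail <- nrow(use) * 10                      # 10 available per use point
avail <- data.frame(x_ = runif(n.avail, min(use$x_), max(use$x_)),
                    y_ = runif(n.avail, min(use$y_), max(use$y_)))

ua <- rbind(data.frame(case_ = TRUE,  use),
            data.frame(case_ = FALSE, avail))
table(ua$case_)  # FALSE 1000, TRUE 100
```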
# Here we will generate 10 times the number of "Use" points to get our list of "available" points. We'll start with this level to begin to inspect our data and look for initial patterns. Before running our final models we'll assess what a robust minimum # of available points should be.
# Create simple boundary of points
hr <- hr_mcp(nt.trk.rs, levels = 1) # Amt has multiple options, including hr_akde(). Levels indicates the isopleth (Here: 100%). We could also upload our own polygon.
# Generate random points within the defined polygon. Important to include the type (random or regular) that we want to be generated.
nt.rsf.10 <- random_points(hr,
n = nrow(nt.trk.rs) * 10,
type ="regular",
presence = nt.trk.rs)
# What does our dataset look like now?
head(nt.rsf.10)
## # A tibble: 6 × 3
## case_ x_ y_
## * <lgl> <dbl> <dbl>
## 1 FALSE 1245978. -224489.
## 2 FALSE 1246144. -224489.
## 3 FALSE 1246309. -224489.
## 4 FALSE 1245151. -224324.
## 5 FALSE 1245316. -224324.
## 6 FALSE 1245482. -224324.
# Cool, the function created a 'case_' field, tracking our 'Use' points (TRUE) and our 'Availability' points (FALSE)
# Summarize
table(nt.rsf.10$case_)
##
## FALSE TRUE
## 11813 1181
#class(nt.rsf.10)
# Plot Use/Availability
plot(nt.rsf.10)
Now that our ‘Use’/‘Availability’ dataset has been constructed, we will extract the value of each raster at these points. We will use the function extract_covariates(), which is simply a wrapper function in amt that uses the terra::extract() function you are now familiar with.
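Under the hood this is just a point-on-raster extraction. With a small in-memory raster (assuming terra is installed; the raster and points here are toy values), the underlying call looks roughly like:

```r
library(terra)

# Toy 10 x 10 raster over [0, 10] x [0, 10]
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10)
values(r) <- 1:100            # cells fill left-to-right, top-to-bottom

# Extract the raster value under each point, as extract_covariates() does internally
pts <- cbind(x = c(0.5, 9.5), y = c(9.5, 0.5))   # top-left and bottom-right cells
terra::extract(r, pts)
```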
# Extract all the raster variables at Use/Available points.
nt.rsf.10 <- nt.rsf.10 %>%
extract_covariates(rsf.stack)
# Look at result
head(nt.rsf.10)
## # A tibble: 6 × 10
## case_ x_ y_ anth_risk fence_dist prirds_dist secrds_dist river_dist
## * <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FALSE 1245978. -2.24e5 5.39 8349. 4641. 7956. 2054.
## 2 FALSE 1246144. -2.24e5 6.80 8473. 4772. 8023. 1822.
## 3 FALSE 1246309. -2.24e5 12.6 8602. 4891. 8096. 1583.
## 4 FALSE 1245151. -2.24e5 10.1 8012. 4302. 7778. 2694.
## 5 FALSE 1245316. -2.24e5 14.0 8118. 4406. 7834. 2484.
## 6 FALSE 1245482. -2.24e5 14.0 8118. 4406. 7834. 2484.
## # ℹ 2 more variables: waterpts_dist <dbl>, woody_dist <dbl>
#names(nt.rsf.10)
# Now let's summarize these results to get a sense of how the values compare between 'use' and 'available' locations
nt.rsf.10 %>%
ggplot(aes(y = anth_risk,
col = case_)) +
geom_boxplot() +
labs(title = "Anthropogenic Risk")
# You may have many covariates, so doing this in a loop makes sense. Instead of using a 'for' loop, we'll use the 'map' function. See 'help(map)'. First we need to make a vector of the variable names that we want to plot.
vars <- nt.rsf.10 %>%
select(anth_risk:woody_dist) %>%
names()
# Then, we simply use 'map' to create a plot for each variable. The '.x' is like our 'i' in a 'for' loop.
box.plots <- map(vars, ~
nt.rsf.10 %>%
select(case_, var = .x) %>%
ggplot(aes(y = var,
col = case_)) +
geom_boxplot() +
labs(title = .x))
# We can put them all together using the cowplot package
cowplot::plot_grid(plotlist = box.plots)
Collinear predictor (independent) variables can influence the variance estimates of the model and make it difficult to interpret model coefficients. This isn’t a problem if prediction is your goal. However, in our case, we are interested in understanding the influence of these predictors on our response variable (i.e., where our animals are located in space and time). As a result, we need to avoid multi-collinearity.
We can get a general sense of pairwise problems between predictor variables by looking at correlation values. As a general rule of thumb, values > 0.7 are recognized as being problematic. We can also check the Variance Inflation Factor (VIF), which assesses multi-collinearity across ALL predictors, with higher values indicating high collinearity of a predictor with the rest of the set. Some sources indicate a VIF over 10 is problematic, while others set the threshold at 3.0.
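To make VIF concrete, here is a base-R toy example (made-up predictors, not our raster covariates) computing it by hand as 1 / (1 - R²), where R² comes from regressing one predictor on all the others:

```r
# Manual VIF: regress each predictor on the others; VIF_j = 1 / (1 - R^2_j)
set.seed(533)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # strongly collinear with x1
x3 <- rnorm(n)                        # independent of the rest
X  <- data.frame(x1, x2, x3)

vif_manual <- function(j, X) {
  r2 <- summary(lm(X[[j]] ~ ., data = X[-j]))$r.squared
  1 / (1 - r2)
}

round(sapply(seq_along(X), vif_manual, X = X), 2)
# x1 and x2 show large VIFs; x3 stays near 1
```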
# Assess collinearity
cor(nt.rsf.10[,4:10])
## anth_risk fence_dist prirds_dist secrds_dist river_dist
## anth_risk 1.00000000 -0.23977402 -0.06791019 -0.08807937 -0.08386911
## fence_dist -0.23977402 1.00000000 -0.17290914 0.45471668 0.04296435
## prirds_dist -0.06791019 -0.17290914 1.00000000 -0.05631540 0.07893212
## secrds_dist -0.08807937 0.45471668 -0.05631540 1.00000000 -0.03240754
## river_dist -0.08386911 0.04296435 0.07893212 -0.03240754 1.00000000
## waterpts_dist -0.17751671 0.82843441 -0.24307007 0.46106999 0.18954933
## woody_dist 0.08933057 -0.65194831 0.72499895 -0.24227224 -0.03801800
## waterpts_dist woody_dist
## anth_risk -0.1775167 0.08933057
## fence_dist 0.8284344 -0.65194831
## prirds_dist -0.2430701 0.72499895
## secrds_dist 0.4610700 -0.24227224
## river_dist 0.1895493 -0.03801800
## waterpts_dist 1.0000000 -0.57964428
## woody_dist -0.5796443 1.00000000
# The table of correlation values indicates that a high level of correlation exists between fence distance and waterpoint distance (0.83). We should remove one of these variables based on our research objectives. Here, I'll keep fence distance because I am more interested in this question from a management standpoint. Some other correlations are observed between woody distance and waterpoints, and between woody distance and primary roads.
# Before making any decisions, let's also check the Variance inflation Factor.
vif(as.data.frame(nt.rsf.10[,4:10]))
## Variables VIF
## 1 anth_risk 1.082620
## 2 fence_dist 6.090865
## 3 prirds_dist 3.407422
## 4 secrds_dist 1.377801
## 5 river_dist 1.175473
## 6 waterpts_dist 4.076093
## 7 woody_dist 5.595513
# Let's remove waterpoints and see if doing so is helpful
vif(as.data.frame(nt.rsf.10[,c(4:8,10)]))
## Variables VIF
## 1 anth_risk 1.078080
## 2 fence_dist 3.007529
## 3 prirds_dist 3.238913
## 4 secrds_dist 1.271642
## 5 river_dist 1.037291
## 6 woody_dist 5.346902
# This helps a lot, but woody distance is still quite high (VIF = 5.3). We could decide to remove it entirely or, if we think it could be an important factor (e.g., woody vegetation is likely avoided by wildebeest because of increased risk of predation), we could include it while making sure woody vegetation is not in the same model as fences and primary roads. We could then evaluate each model separately using AIC. For now, we'll proceed.
Before we move on to final model fitting, we must first address whether we have adequately sampled ‘Availability’. Ultimately, there is a trade-off between statistical robustness and computer processing time. I generally perform this on a single individual (as we are doing here) as evidence that I am appropriately sampling from availability.
Our goal is to determine the optimal sample of availability necessary to achieve stable coefficient estimates. To do so, we must iterate (or “loop”) over different sets of availability, from 1 ‘Availability’ point per ‘Use’ point to 100 ‘Availability’ points per ‘Use’ point. We will use a nested structure, rather than the “for loop” structure that we have demonstrated in previous exercises. For this somewhat complicated process, I think the nested structure is a bit easier. We then do this process multiple times, so that the ‘Available’ points differ and so we can evaluate how much the coefficients change between repetitive model fittings.
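The logic of that simulation can be sketched with made-up data in base R (one toy covariate, not our rasters): for each availability ratio, draw background points, fit a use/available logistic GLM, and keep the slope estimate.

```r
# Toy availability simulation: used points favor high covariate values,
# available points are background draws; refit the GLM at each ratio.
set.seed(533)
n.use <- 200
use.x <- rnorm(n.use, mean = 1)          # 'use' locations, shifted upward

fit_one <- function(frac) {
  avail.x <- rnorm(n.use * frac)         # 'available' background points
  d <- data.frame(case_ = rep(c(TRUE, FALSE), c(n.use, n.use * frac)),
                  cov   = c(use.x, avail.x))
  coef(glm(case_ ~ cov, data = d, family = binomial))["cov"]
}

# Slope estimates should stabilize as the availability ratio grows
round(sapply(c(1, 5, 20, 50), fit_one), 2)
```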
# Setup number of available locations to sample during each model fitting
n.frac <- c(1, 5, 20, 50, 100)
# Total available locations to be generated, based on the number of Use points
n.pts <- nrow(nt.trk.rs) * n.frac
# Number of repetitions. This is simulation so that we can evaluate the variability that exists in the coefficients
# For a publication, I would increase the n.rep to 100. Here, 20 is fine.
n.rep <- 20
# Create a table which saves the settings of each scenario
# We then extract the covariates during each repetition from rsf.stack during each run (the points will vary)
# Then, fit a glm model for each available location (1,5,20,50,100) and for each replication (re-sampling)
# Run simulation and store results
# Commented out as the process is time consuming
# **********************************************
# wb.sim <- tibble(
# n.pts = rep(n.pts, n.rep),
# frac = rep(n.frac, n.rep),
# result = map(
# n.pts, ~
# nt.trk.rs %>% random_points(n = .x) %>%
# extract_covariates(rsf.stack) %>%
# mutate(anth_risk = scale(anth_risk),
# woody_dist = scale(woody_dist),
# fence_dist = scale(fence_dist),
# prirds_dist = scale(prirds_dist),
# river_dist = scale(river_dist),
# secrds_dist = scale(secrds_dist),
# waterpts_dist = scale(waterpts_dist)) %>%
# # Fit basic model
# glm(case_ ~ anth_risk +
# woody_dist +
# fence_dist +
# prirds_dist +
# river_dist +
# secrds_dist +
# waterpts_dist,
# data = ., family = binomial(link = "logit")) %>%
# broom::tidy()))
# Save the file so we don't need to run this every time
# write_rds(wb.sim, file = "Data/nt.sim.rds")
# Read in the result
wb.sim <- read_rds("Data/nt.sim.rds")
# Look at the summary table of results
# This shows that we have 100 rows, summarizing the number of points (n.pts) and the availability fraction (frac)
# We have 100 rows because we have 5 different fractions (n.frac -> 1, 5, 20, 50, 100) and we ran the simulation n.rep = 20 times. 5 x 20 = 100 rows of results
wb.sim
## # A tibble: 100 × 3
## n.pts frac result
## <dbl> <dbl> <list>
## 1 1181 1 <tibble [8 × 5]>
## 2 5905 5 <tibble [8 × 5]>
## 3 23620 20 <tibble [8 × 5]>
## 4 59050 50 <tibble [8 × 5]>
## 5 118100 100 <tibble [8 × 5]>
## 6 1181 1 <tibble [8 × 5]>
## 7 5905 5 <tibble [8 × 5]>
## 8 23620 20 <tibble [8 × 5]>
## 9 59050 50 <tibble [8 × 5]>
## 10 118100 100 <tibble [8 × 5]>
## # ℹ 90 more rows
# Look at the first simulation result
wb.sim$result[[1]]
## # A tibble: 8 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.0372 0.0467 -0.797 4.26e- 1
## 2 anth_risk -0.0522 0.0493 -1.06 2.90e- 1
## 3 woody_dist -0.151 0.109 -1.38 1.66e- 1
## 4 fence_dist 0.164 0.130 1.26 2.09e- 1
## 5 prirds_dist -0.939 0.0957 -9.81 1.05e-22
## 6 river_dist 0.261 0.0523 4.99 6.15e- 7
## 7 secrds_dist 0.291 0.0554 5.26 1.47e- 7
## 8 waterpts_dist -0.903 0.109 -8.29 1.16e-16
# Visualize/Graph findings
# We must 'unnest' the contents of each nested dataframe to visualize the coefficient estimates from the individual model fits
wb.sim %>% unnest(cols = result) %>%
mutate(term = recode(term,
"(Intercept)" = "Intercept",
anth_risk = "Anthropogenic disturbance",
woody_dist = "Woody Distance",
fence_dist = "Fence Distance",
prirds_dist = "Pri Rds Distance",
river_dist = "River Distance",
secrds_dist = "Sec Rds Distance",
waterpts_dist = "Water Point Distance")) %>%
ggplot(aes(factor(frac),
y = estimate)) +
geom_boxplot() +
facet_wrap(~ term, scale ="free") +
geom_jitter(alpha = 0.2) +
labs(x = "Available Points per Use Location",
y = "Estimate") +
theme_light()
# This is a key tool in making a final assessment of the minimum # of available points needed for each use location. We can see little change in the coefficient values once we have 20 points per use location, but we'll use 50 points per 'Use' location to be safe.
Based on the model fitting simulations, we will generate 50 available locations per use location. More would be better, but there is seemingly little change between model fits as we increase the number of availability locations beyond this threshold.
Here the intercept decreases as the number of available points increases, but the slope parameter estimates, on average, do not change much once we include at least 20 available points per used point. We conclude, at least in this particular case, that 20 available points per used point is sufficient for interpreting the slope coefficients. We will follow the same process described above: generate random points, extract the covariates, and fit the model. We then plot the parameter responses.
# Generating 50 available locations per each used location within animal homerange
nt.rsf.50 <- random_points(hr,
n = nrow(nt.trk.rs) * 50,
type ="regular",
presence = nt.trk.rs)
# Extracting the covariates
nt.rsf.50 <- nt.rsf.50 %>% extract_covariates(rsf.stack)
# Fit a "full" model after removing waterpoints because of high collinearity. We also should avoid running fence distance or primary roads with woody distance, because they are highly correlated. As a result, let's fit three "full" models and then compare them.
# Fitting Model 1 - No woody distance
M1.NoWood <- glm(case_ ~ scale(anth_risk) +
scale(fence_dist) +
scale(prirds_dist) +
scale(secrds_dist) +
scale(river_dist),
family = binomial(link="logit"),
data = nt.rsf.50)
# Get Summary & calculate profile confidence intervals to see if coefficients overlap zero
summary(M1.NoWood)
##
## Call:
## glm(formula = case_ ~ scale(anth_risk) + scale(fence_dist) +
## scale(prirds_dist) + scale(secrds_dist) + scale(river_dist),
## family = binomial(link = "logit"), data = nt.rsf.50)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.30983 0.04139 -104.125 < 2e-16 ***
## scale(anth_risk) -0.18768 0.03066 -6.121 9.32e-10 ***
## scale(fence_dist) -0.54445 0.04315 -12.616 < 2e-16 ***
## scale(prirds_dist) -0.93501 0.03851 -24.279 < 2e-16 ***
## scale(secrds_dist) 0.30225 0.03957 7.638 2.21e-14 ***
## scale(river_dist) 0.11678 0.02996 3.897 9.73e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 11626 on 60236 degrees of freedom
## Residual deviance: 10805 on 60231 degrees of freedom
## AIC: 10817
##
## Number of Fisher Scoring iterations: 7
# broom::tidy(M1.NoWood) # Could also use the tidy summary, which is a little easier to read
confint(M1.NoWood)
## 2.5 % 97.5 %
## (Intercept) -4.39235743 -4.2300402
## scale(anth_risk) -0.24868705 -0.1284706
## scale(fence_dist) -0.63021441 -0.4609706
## scale(prirds_dist) -1.01134248 -0.8603316
## scale(secrds_dist) 0.22465204 0.3798184
## scale(river_dist) 0.05791692 0.1753952
# Fitting Model 2 - No Fencing or Primary Roads
M2.Wood <- glm(case_ ~ scale(anth_risk) +
scale(secrds_dist) +
scale(river_dist) +
scale(woody_dist),
family = binomial(link="logit"),
data = nt.rsf.50)
# Get Summary & calculate profile confidence intervals to see if coefficients overlap zero
summary(M2.Wood)
##
## Call:
## glm(formula = case_ ~ scale(anth_risk) + scale(secrds_dist) +
## scale(river_dist) + scale(woody_dist), family = binomial(link = "logit"),
## data = nt.rsf.50)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.04114 0.03315 -121.896 < 2e-16 ***
## scale(anth_risk) 0.02132 0.02733 0.780 0.435
## scale(secrds_dist) -0.11458 0.02737 -4.187 2.83e-05 ***
## scale(river_dist) 0.02539 0.02895 0.877 0.380
## scale(woody_dist) -0.54079 0.03274 -16.516 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 11626 on 60236 degrees of freedom
## Residual deviance: 11331 on 60232 degrees of freedom
## AIC: 11341
##
## Number of Fisher Scoring iterations: 7
confint(M2.Wood)
## 2.5 % 97.5 %
## (Intercept) -4.10693028 -3.97695112
## scale(anth_risk) -0.03313973 0.07399680
## scale(secrds_dist) -0.16883587 -0.06152153
## scale(river_dist) -0.03159199 0.08192333
## scale(woody_dist) -0.60529722 -0.47692933
# Compare models with AIC
AIC(M1.NoWood, M2.Wood)
## df AIC
## M1.NoWood 6 10817.1
## M2.Wood 5 11341.0
The AIC summary provides evidence that the M1.NoWood model is superior to our alternative model (M2.Wood). We can make some assessments of the effect of each predictor simply by looking at the sign (positive or negative) of each coefficient in the summary table. But what do these coefficients actually mean?
Because the coefficients are displayed on the scale of log-odds, they are (or can be) difficult to interpret. To interpret them as odds ratios (referred to as the “Relative Selection Strength” in the RSF world), we need to exponentiate (exp()) the coefficients. We will plot the log-odds and then calculate the Relative Selection Strength of each predictor variable.
# Plot raw, non-transformed estimates (log-odds)
plot_model(M1.NoWood, transform = NULL)
# This indicates that we have two predictors (river distance and secondary roads) that have positive effects and three variables (anthropogenic risk, fence distance, and primary roads) that have negative effects.
# Specifically, we see that when we increase human risk (1 unit increase), we reduce (negative sign) the relative use probability. This is what we would expect (i.e., more human population leads to fewer wildebeest). For fences and primary roads, we also see a negative effect. Here, a 1 unit increase in the distance to fences or primary roads results in a negative effect on the relative use probability of wildebeest. That is, relative use probability of wildebeest is higher closer to these features (a positive association) than further away. This is an unexpected result. We should remember, however, that we are only investigating 1 animal at this time.
# For secondary roads and rivers, we see positive effects (i.e., an increase in the distance to roads and rivers leads to an increase in the relative use probability of wildebeest). That is, relative use probability is higher as we move away from these features (a negative association). This is also unexpected, at least for secondary roads.
# Create an alternative plot of the raw coefficients (log-odds)
coef_plot <- sjPlot::plot_model(M1.NoWood,
type = "est",
colors = c("cadetblue", "tomato"),
transform = NULL,
title = "Plot of Standardized Coefficients")
# coef_plot
# We can keep the above plot, or improve it using ggplot. Here we'll add a horizontal line, centered on 0, so that we can easily see the positive and negative effects.
coef_plot +
geom_hline(yintercept = 0,
linetype = "dashed",
col = "darkgray") +
theme_classic()
# Check out help(plot_model), as there are a number of customizable arguments. You could, for example, use the "terms" argument to indicate which terms you want to show on the plot and also show the odds ratios instead of the log odds (raw coefficients is the default). The odds ratios, however, are harder to interpret when the coefficients are negative, as we'll see below.
# coef_plot.example <- sjPlot::plot_model(M1.NoWood,
# type = "est",
# colors = c("cadetblue", "tomato"),
# terms = c("scale(anth_risk)","scale(fence_dist)"), # Will only plot the predictors specified
# transform = "exp", # Exponentiating to plot the odds ratios
# title = "Plot of Standardized Coefficients")
#
# coef_plot.example
# Now let's convert the log odds to odds ratios to evaluate Relative Selection Strength. We do this by exponentiating the coefficients
my_coefs <- coef(M1.NoWood)
exp(my_coefs)
## (Intercept) scale(anth_risk) scale(fence_dist) scale(prirds_dist)
## 0.01343577 0.82888334 0.58015815 0.39258022
## scale(secrds_dist) scale(river_dist)
## 1.35289927 1.12386749
# Remember: Don't interpret the intercept in RSF models. This is meaningless.
# Any value below 1 has a negative impact on use probability; Any value above 1 is a positive impact. These values can be interpreted as the relative change in the probability of use with 1 unit of change in the predictor. Easy to interpret for predictors with a positive sign.
# For negative coefficients, it's easier to include a negative sign when exponentiating
exp(-my_coefs[2:4])
## scale(anth_risk) scale(fence_dist) scale(prirds_dist)
## 1.206442 1.723668 2.547250
# Thus, for example, we can say a location with a 1 unit increase in anth_risk is 1.20 times LESS likely to be used by this wildebeest. We see the same general patterns for fence distance (1.72 times LESS likely) and primary road distance (2.55 times LESS likely).
# One last IMPORTANT note: We scaled our predictor variables. As a result, a 1 unit change is actually a change in 1 STANDARD DEVIATION. If we did not scale our predictor variables, a 1 unit change in the distance variables would be 1 meter, for example. Just something to be aware of.
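A quick base-R check of this point (made-up data, not our model): the coefficient on scale(x) equals the per-unit coefficient multiplied by sd(x), so dividing a standardized coefficient by sd(x) recovers the effect per original unit (e.g., per meter).

```r
# Standardized vs raw coefficients: b_scaled = b_raw * sd(x)
set.seed(533)
x <- rnorm(200, mean = 500, sd = 120)              # e.g., a distance in meters
y <- rbinom(200, 1, plogis(-2 + 0.004 * x))        # toy use/available response

b.raw    <- coef(glm(y ~ x,        family = binomial))["x"]
b.scaled <- coef(glm(y ~ scale(x), family = binomial))[2]

all.equal(unname(b.scaled), unname(b.raw * sd(x)), tolerance = 1e-6)  # TRUE
```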
We can also visualize the response to each predictor variable by holding all other predictor variables constant. The visreg() function can be used to do this process for us, making plotting easy. We need to note, however, that the intercept is included when using this function. The amt package also has a function, log_rss(), to calculate Relative Selection Strength across a sequence of values. Using this function gives us flexibility and is more appropriate for visualizing the response.
# Use visreg to visualize
# visreg(M1.NoWood,
# xvar = "anth_risk",
# scale="response",
# ylab="Relative Use Intensity",
# xlab="Anthropogenic risk",
# partial=F,
# rug=F,
# line=list(col="black"),
# fill=list(col="light gray"))
#
# # Let's loop over all the variables and plot them
# # Step 1: Create an object holding the variables
# var1 <- c("anth_risk",
# "fence_dist",
# "prirds_dist",
# "secrds_dist",
# "river_dist")
#
# # Step 2: Create an object of the variable names (for plotting labels)
# var.names <- c("Anthropogenic Risk Index",
# "Fence Distance (m)",
# "Primary Road Distance (m)",
# "Secondary Roads Distance (m)",
# "River Distance (m)")
#
# # Step 3: Set plotting window
# par(mfrow=c(3,2))
#
# # Step 4: Loop over each variable and plot
# for (i in 1:length(var1)){
# visreg(M1.NoWood,
# xvar = var1[i],
# scale="response",
# ylab="Relative Use Intensity",
# xlab=var.names[i],
# partial=F,
# rug=F,
# line=list(col="black"),
# fill=list(col="light gray"))
# }
#
# # Step 5: Reset plotting window
# par(mfrow=c(1,1))
# Use log_rss()
# Step 1: Create a dataframe of a sequence of values to predict over (seq(min, max)). Note that I am holding all other variables constant (at their mean values)
df1 <- data.frame(anth_risk = seq(min(nt.rsf.50$anth_risk),
max(nt.rsf.50$anth_risk),
length.out = 100),
fence_dist = mean(nt.rsf.50$fence_dist),
river_dist = mean(nt.rsf.50$river_dist),
prirds_dist = mean(nt.rsf.50$prirds_dist),
secrds_dist = mean(nt.rsf.50$secrds_dist))
# Step 2: Create a dataframe of what we are comparing to. Here, I'm comparing to the mean anth_risk value
df2 <- data.frame(anth_risk = mean(nt.rsf.50$anth_risk),
fence_dist = mean(nt.rsf.50$fence_dist),
river_dist = mean(nt.rsf.50$river_dist),
prirds_dist = mean(nt.rsf.50$prirds_dist),
secrds_dist = mean(nt.rsf.50$secrds_dist))
# Step 3: Use the log_rss() function to predict across our sequence using coefficients from our model
logRSS_riskRange <- log_rss(object = M1.NoWood,
x1 = df1,
x2 = df2)
# Look at result
# logRSS_riskRange
# This output is set up nicely for plotting, since we have our input values of risk, and we have our output values for log-RSS. I'd like to plot the odds-ratio values though (as opposed to the log-odds) so in plotting, we'll take the exp() of those values.
# Step 4: Plot
ggplot(logRSS_riskRange$df,
aes(x = anth_risk_x1,
y = exp(log_rss))) +
geom_line(linewidth = 1) +
geom_hline(yintercept = exp(0),
linetype = "dashed",
color = "gray30") +
xlab("Anthropogenic Risk") +
ylab("RSS vs Mean Risk") +
theme_bw()
# The RSS of 1.0 here crosses the line at the mean value of risk, since that is what this relative value (odds ratio) is being compared to.
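For a logistic RSF, log-RSS between two locations is just the difference in their linear predictors, so for a single varying covariate it reduces to beta * (x1 - x2). A minimal hand-computed sketch (with a made-up coefficient, not our fitted value) shows why the curve crosses 1.0 exactly at the reference value:

```r
# Hypothetical slope for a scaled risk covariate -- illustrative only
beta.risk <- -0.19

x.seq <- seq(-2, 2, length.out = 5)   # candidate (scaled) covariate values
x.ref <- 0                            # reference: the mean (0 on the scaled axis)

# log-RSS(x1, x2) = beta * (x1 - x2); exponentiate for the odds-ratio scale
log.rss <- beta.risk * (x.seq - x.ref)
rss <- exp(log.rss)

# RSS equals exactly 1 where the candidate value equals the reference
rss[x.seq == x.ref]
```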
The last step is to generate a predictive surface of habitat
suitability. We can’t use predict() in this case, because
we want to ignore the intercept, but we can do all the same things with
some extra work. Also remember that we must scale the
raster values when making the prediction, since our coefficient
estimates are based on scaled values.
# Extract the coefficient of each predictor from the model summary
coeff <- M1.NoWood$coefficients
# Then, use the logistic equation to generate predictions (no B_0)
# w*(x) = exp(β1x1 + β2x2 + ... + βnxn) / (1 + exp(β1x1 + β2x2 + ... + βnxn))
# where w*(x) is the relative probability of selection, dependent upon covariates x1 through xn, and their estimated regression coefficients β1 to βn, respectively.
# IMPORTANT: Since we scaled our parameters, we MUST also scale our raster layers. The scaled raster layers will look the same, but the range of values being plotted will change. Here, I'm scaling by subtracting the mean value of the parameter and dividing by the standard deviation. This is what we did above by using the scale() function above. See help(scale).
# See how this is written: ScaledRasterLayer = (unscaled raster layer - mean(dataset$predictor)) / sd(dataset$predictor)
anth.scale <- (rsf.stack$anth_risk - mean(nt.rsf.50$anth_risk)) / sd(nt.rsf.50$anth_risk)
fence.scale <- (rsf.stack$fence_dist - mean(nt.rsf.50$fence_dist)) / sd(nt.rsf.50$fence_dist)
prirds.scale <- (rsf.stack$prirds_dist - mean(nt.rsf.50$prirds_dist)) / sd(nt.rsf.50$prirds_dist)
secrds.scale <- (rsf.stack$secrds_dist - mean(nt.rsf.50$secrds_dist)) / sd(nt.rsf.50$secrds_dist)
river.scale <- (rsf.stack$river_dist - mean(nt.rsf.50$river_dist)) / sd(nt.rsf.50$river_dist)
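As a quick sanity check (on a made-up vector, since no raster is needed for the arithmetic), the manual `(x - mean) / sd` formula used above reproduces what `scale()` did to the point data:

```r
# Made-up values -- illustrative only
v <- c(3, 8, 15, 42, 77)

# Manual standardization, as applied to the raster layers above
manual <- (v - mean(v)) / sd(v)

# Matches base R's scale() (which returns a matrix, hence as.vector)
all.equal(manual, as.vector(scale(v)))
```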
# Prediction - Need to get the coefficient numbers [[i]] correct
pred <- exp(anth.scale*coeff[[2]] +
fence.scale*coeff[[3]] +
prirds.scale*coeff[[4]] +
secrds.scale*coeff[[5]] +
river.scale*coeff[[6]]) /
(1+exp(anth.scale*coeff[[2]] +
fence.scale*coeff[[3]] +
prirds.scale*coeff[[4]] +
secrds.scale*coeff[[5]] +
river.scale*coeff[[6]]))
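Note that the `exp(z) / (1 + exp(z))` expression above is the inverse-logit; base R's `plogis()` computes the same quantity and is numerically safer for large linear predictors. A minimal sketch with a made-up linear predictor:

```r
# Made-up linear-predictor values -- illustrative only
eta <- c(-3, -0.5, 0, 1.2, 4)

# The hand-written inverse-logit used for the raster prediction
manual <- exp(eta) / (1 + exp(eta))

# plogis() gives the same result
all.equal(manual, plogis(eta))
```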
# Provide Spatial Prediction - Based on the coefficients from this single animal (Ntishya)
plot(pred)
# Let's mask the data with the Athi-Kaputiei Boundary file and then use tmap to plot
# We loaded a spatial polygon data layer at the start of this script named "Athi.Bound"
pred.nt <- mask(pred, Athi.Bound)
# Graph on tmap
tm_shape(pred.nt,
name = "Habitat Suitability") +
tm_raster(palette = "PuBuGn", n = 10,
alpha = 0.6,
title = "Habitat Suitability") +
tm_shape(Athi.Bound,
name = "Athi-Kaputiei Plains") +
tm_polygons(alpha = 0, # Make polygon transparent
border.col = "green") +
tm_layout("Ntishya Example")
# Save this file in your data directory
# writeRaster(pred.nt, filename ="./Output/Prediction.nt.tif", overwrite = TRUE)
More than likely, you will have multiple tagged animals and will be interested in investigating resource use across your entire population (or interested in differences in selection between different populations). This can be accomplished in multiple ways:
The two-stage approach follows the same procedure as described above,
except that you would select additional animals and fit a GLM to each
individual. You could then output the summary coefficients and either
apply the mean of your coefficients (e.g.,
mn.coeff.anth = mean(c(model1.coeff.anth, model2.coeff.anth, model3.coeff.anth)))
to the scaled raster layers, or create individual predictions and
calculate the mean of the predictive surfaces. Estimating the variance
between animals, however, is very difficult. Model structures must be
exactly the same between individuals in order to make comparisons.
Example of the command to calculate the mean of predictions from multiple animals:
mean.pred <- mean(pred.mask.Animal1, pred.mask.Animal2, pred.mask.Animal3)
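A minimal sketch of the two-stage bookkeeping, using made-up per-animal coefficients (the values and names are hypothetical, not from our fitted models):

```r
# Hypothetical per-animal coefficients for anth_risk -- illustrative only
coefs.anth <- c(Animal1 = -0.21, Animal2 = -0.15, Animal3 = -0.30)

# Population-level point estimate: the mean across individuals
mn.coeff.anth <- mean(coefs.anth)

# Between-animal spread; note this is a naive measure -- proper two-stage
# methods also account for each animal's sampling variance
sd.coeff.anth <- sd(coefs.anth)

mn.coeff.anth
```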
Mixed-effects models are commonly used to provide population-level estimates. The same process as described above is used, but the model structure is slightly more complicated. The amt package provides a nice structure for creating random points, with a simple process for annotating points with the underlying environmental layers, linked to each individual.
The dataframe needs to be converted to a list, with a dataframe for
each animal. The lapply command can then be used to make a
track (mk_track) and generate a set of random points
(random_points) for each animal.
# Grab the original dataset (before filtering for Ntishya)
nt <- WB.data.sf %>%
mutate(x = st_coordinates(WB.data.sf)[ ,1],
y = st_coordinates(WB.data.sf)[ ,2]) %>%
as_tibble() %>%
select(-geometry)
# Grab the Ids from the file
WB.Id <- unique(nt$id)
# Creating a list that contain the dataframe for each individual
# Splitting by id
nt <- split(nt,nt$id)
# Creating track for each individual animal in the list
# Note, we've again included time, but we're not using it in the model structure
WB.trk.all <- lapply(nt, function(x) mk_track(x,
.x=x,
.y=y,
.t=date,
crs = crs(rsf.stack),
order_by_ts = T,
id = id,
sex = sex))
# Let's reduce each id's subset, keeping a random 20% of its locations
WB.trk.all <- lapply(WB.trk.all,
function(x) x[sample(1:nrow(x),
size = ceiling(nrow(x)*0.20)),c("x_","y_","t_","id","sex")])
# Generate 50 random points for each animal
# Using a list apply here to apply the function
# This will create the case_ column
WB.Athi <- lapply(WB.trk.all,
function(x) random_points(x,
n=nrow(x) * 50,
typ="regular"))
We can then extract the relevant base layer information at each data
point, similar to above, but by applying the
extract_covariates to the list.
# Extract all covariates
WB.Athi <- lapply(WB.Athi, function(x) extract_covariates(x, rsf.stack))
# Note, we lost the id column, so we need to put that back in (that's why we created WB.Id above). We'll use this as the random effect.
WB.Athi <- mapply(cbind, WB.Athi, "Animal_ID" = WB.Id, SIMPLIFY=F)
# Binding all dataframes in the list into a single dataframe for analysis
WB.Athi <- do.call(rbind,WB.Athi)
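The split / lapply / do.call(rbind, ...) pattern used above is generic base R; a minimal sketch with a made-up two-animal dataframe:

```r
# Made-up two-animal data -- illustrative only
df <- data.frame(id = rep(c("A", "B"), each = 3), x = 1:6)

# Split into a per-animal list, transform each element, then stack back
lst <- split(df, df$id)
lst <- lapply(lst, function(d) { d$x2 <- d$x * 2; d })
out <- do.call(rbind, lst)

# Same number of rows as the input, now with the new column
nrow(out)
```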
The model structure looks similar to what we have already done
together, except we are now adding a random effect (1|Animal_ID). We’ll
also include random SLOPES on all the predictor variables, allowing them
to vary with each individual. This is a reasonable assumption given the
variability that exists between individuals within the same population.
Fitting random effects, however, means that these models become
increasingly complicated and can take a long time to run. A proper
procedure would be to test multiple models and use AIC to determine the
best fit (e.g., function AICtab). Here, we’ll simply fit one
complicated model so that we can see how the standard errors change
from our initial GLM structure.
# This model will be slow to execute
# mixed.model <- glmer(case_ ~ scale(anth_risk) +
# scale(fence_dist) +
# scale(prirds_dist) +
# scale(secrds_dist) +
# scale(river_dist) +
# (1 + scale(anth_risk) +
# scale(fence_dist) +
# scale(prirds_dist) +
# scale(secrds_dist) +
# scale(river_dist)
# |Animal_ID),
# family = binomial(link = "logit"),
# data = WB.Athi)
# Save/Load the model
# saveRDS(mixed.model, file = "Output/MixedModel.rds")
# Data load/import
mixed.model <- read_rds("Output/MixedModel.rds")
# Print Model Summary
summary(mixed.model)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: case_ ~ scale(anth_risk) + scale(fence_dist) + scale(prirds_dist) +
## scale(secrds_dist) + scale(river_dist) + (1 + scale(anth_risk) +
## scale(fence_dist) + scale(prirds_dist) + scale(secrds_dist) +
## scale(river_dist) | Animal_ID)
## Data: WB.Athi
##
## AIC BIC logLik deviance df.resid
## 78608.3 78909.0 -39277.1 78554.3 508790
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1 0 0 0 1518766
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Animal_ID (Intercept) 2.2161 1.4887
## scale(anth_risk) 0.1677 0.4095 -0.10
## scale(fence_dist) 1.5904 1.2611 -0.34 -0.02
## scale(prirds_dist) 4.5267 2.1276 -0.05 0.12 0.61
## scale(secrds_dist) 3.4113 1.8470 0.38 -0.24 -0.19 -0.71
## scale(river_dist) 3.5176 1.8755 0.27 -0.56 -0.34 -0.63 0.80
## Number of obs: 508817, groups: Animal_ID, 12
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.3986 0.4076 -13.245 < 2e-16 ***
## scale(anth_risk) -0.5399 0.1141 -4.732 2.22e-06 ***
## scale(fence_dist) -0.6826 0.3251 -2.100 0.0357 *
## scale(prirds_dist) -1.6274 0.5095 -3.194 0.0014 **
## scale(secrds_dist) -0.1770 0.4710 -0.376 0.7071
## scale(river_dist) 0.2315 0.4386 0.528 0.5977
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) scl(n_) scl(f_) scl(p_) scl(s_)
## scl(nth_rs) -0.054
## scl(fnc_ds) -0.257 -0.088
## scl(prrds_) 0.052 0.048 0.475
## scl(scrds_) 0.340 -0.148 0.027 -0.630
## scl(rvr_ds) 0.193 -0.518 -0.166 -0.525 0.729
# Look at the random effects
ranef(mixed.model)
## $Animal_ID
## (Intercept) scale(anth_risk) scale(fence_dist) scale(prirds_dist)
## Karbolo -2.5235089 -0.18812129 2.34838949 -0.8602387
## Kikaya -1.3003323 -0.07944210 0.52062624 -0.9847296
## Kiranto -2.0320640 0.75148606 -0.94222157 0.4477869
## Laingoni 0.1936768 -0.55979929 -0.63469866 -0.5847908
## Ne Mpaash 1.6954577 -0.36658101 -3.07282212 -5.7280363
## Noontare -0.8516896 -0.62685967 0.45284839 3.2958350
## Ntishya 1.2869236 0.12807899 0.29114450 0.8275822
## Ole Nagol -1.3726671 0.39439216 -0.37733430 -0.9267501
## Paita 1.1005878 0.44596674 0.18144029 1.5873469
## Peria 1.6384499 0.33324203 -0.11786805 0.3468489
## Sawani 1.2327873 -0.09814858 1.29305798 1.3398742
## Sotua 1.0866623 -0.11385910 0.09267562 1.2976688
## scale(secrds_dist) scale(river_dist)
## Karbolo 2.0563582 1.9621798
## Kikaya -0.4299970 -1.9139726
## Kiranto -3.2917346 -2.3781533
## Laingoni -0.8173689 0.3471248
## Ne Mpaash 4.2769052 5.0091024
## Noontare -2.2230148 0.1025360
## Ntishya 0.3899298 -0.4029089
## Ole Nagol -0.2523859 -1.0498456
## Paita 0.5711507 -0.2116802
## Peria 0.3200733 -1.4526135
## Sawani 0.2257012 -0.1335716
## Sotua -0.7958649 0.1174268
##
## with conditional variances for "Animal_ID"
Not surprisingly, this model is a far better fit to the data. The biggest thing to note is the change in the standard-error estimates for each parameter included in the model. We are also presented with two “pseudo-R2” terms. The conditional R2 is always higher and reflects the model’s ability to explain variance in the response when it has information about the level of the random effect. Large differences between these two R2 values indicate that there are large differences in predicted values across the levels of our random effect.
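To see marginal vs conditional R2 in practice, here is a small self-contained sketch using lme4's built-in cbpp dataset (illustrative only, not our wildebeest data); the R2 call itself is left commented because it assumes the performance package is installed:

```r
library(lme4)

# A small binomial GLMM on lme4's built-in cbpp dataset -- illustrative only
m <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
           data = cbpp, family = binomial)

# Marginal R2 (fixed effects only) vs conditional R2 (fixed + random effects)
# performance::r2(m)   # assumes the 'performance' package is installed
```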
Similar to earlier exercises, let’s graph and interpret results.
# Plot coefficients and response curves for inference at the population level
plot_model(mixed.model,transform = NULL)
# We can see that the coefficient plot works the same, and it just shows the fixed effects. But we can see the huge impact it has on the variance estimation if we allow for random effects.
# Exponentiate the coefficients and plot the odds ratios
#tab_model(mixed.model)
# Graph response curves using effect_plot() from jtools. We could do this for all our parameters.
effect_plot(mixed.model,
data = WB.Athi,
pred = anth_risk,
interval = TRUE,
x.label = "Anthropogenic risk",
y.label = "Relative Selection Strength") +
theme_classic()
We can now generate a predictive surface from the new coefficients,
following the same procedure as described previously. Alternatively, we
could use the predict function, but would then need to
subtract the intercept term from the model and exponentiate the result
(my_prediction2 <- exp(my_prediction - my_coef[[1]])).
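The intercept-subtraction trick works because, on the link scale, removing β0 from the linear predictor leaves only the covariate contribution. A minimal numeric sketch with made-up coefficients (not our fitted values):

```r
# Made-up intercept, slope, and scaled covariate values -- illustrative only
b0 <- -5.4
b1 <- -0.54
x <- c(-1, 0, 2)

eta <- b0 + b1 * x                 # full linear predictor (link scale)

# Subtracting the intercept before exponentiating leaves only exp(b1 * x)
all.equal(exp(eta - b0), exp(b1 * x))
```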
# Extract the coefficient of each predictor from the model summary
coef <- summary(mixed.model)
coeff <- coef$coefficients
# Scale rasters. Need to do this again because the sample used is different than previously calculated:
anth.scale <- (rsf.stack$anth_risk - mean(WB.Athi$anth_risk)) / sd(WB.Athi$anth_risk)
fence.scale <- (rsf.stack$fence_dist - mean(WB.Athi$fence_dist)) / sd(WB.Athi$fence_dist)
prirds.scale <- (rsf.stack$prirds_dist - mean(WB.Athi$prirds_dist)) / sd(WB.Athi$prirds_dist)
secrds.scale <- (rsf.stack$secrds_dist - mean(WB.Athi$secrds_dist)) / sd(WB.Athi$secrds_dist)
river.scale <- (rsf.stack$river_dist - mean(WB.Athi$river_dist)) / sd(WB.Athi$river_dist)
# Prediction
pred.mixed <- exp(anth.scale*coeff[[2]] +
fence.scale*coeff[[3]] +
prirds.scale*coeff[[4]] +
secrds.scale*coeff[[5]] +
river.scale*coeff[[6]]) /
(1+exp(anth.scale*coeff[[2]] +
fence.scale*coeff[[3]] +
prirds.scale*coeff[[4]] +
secrds.scale*coeff[[5]] +
river.scale*coeff[[6]]))
# Provide Spatial Prediction - Based on the population-level (fixed-effect) coefficients from the mixed model
plot(pred.mixed)
# Mask result and plot using tmap with the Athi Boundary
pred.mixed <- mask(pred.mixed, Athi.Bound)
# Graph on tmap
tm_shape(pred.mixed,
name = "Habitat Suitability") +
tm_raster(palette = "PuBuGn", n = 10,
alpha = 0.6,
title = "Habitat Suitability") +
tm_shape(Athi.Bound,
name = "Athi-Kaputiei Plains") +
tm_polygons(alpha = 0, # Make polygon transparent
border.col = "green") +
tm_layout("Random Effects Example")
# Save this file in your data directory
writeRaster(pred.mixed, filename = "Output/Prediction.mixed.tif", overwrite = TRUE)
These results represent a significant improvement over our initial GLM. We thinned the data in an ad hoc fashion, which is not desirable. We also haven’t included any polynomial or interaction effects. More appropriate would be to use all the data that we fought hard to collect and account for the autocorrelation structure inherent in the data. We can do this in two ways: